Exploiting Sentence Similarities for Better Alignments

Authors

Tao Li

Vivek Srikumar

Abstract

We study the problem of jointly aligning sentence constituents and predicting their similarities. While extensive sentence similarity data exists, manually generating reference alignments and labeling the similarities of the aligned chunks is comparatively onerous. This prompts the natural question of whether we can exploit easy-to-create sentence-level data to train better aligners. In this paper, we present a model that learns to jointly align the constituents of two sentences and predict their similarities. By taking advantage of both sentence-level and constituent-level data, we show that our model achieves state-of-the-art performance at predicting alignments and constituent similarities.

Similar resources

Reliable Measures for Aligning Japanese-English News Articles and Sentences

We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignm...

MUTT: Metric Unit TesTing for Language Generation Tasks

METEOR: a metric that computes soft similarities between sentences by computing synonym and paraphrase scores between sentence alignments. SICK+: Since SICK is for compositional semantics, all sentences have proper grammar. We automatically generated ungrammatical sentences (without human-estimated scores) to supplement the existing sentence pairs. Dataset Case Study: SICK: We examine how well hu...

Parallel Seed-Based Approach to Multiple Protein Structure Similarities Detection

Finding similarities between protein structures is a crucial task in molecular biology. Most of the existing tools require proteins to be aligned in order-preserving way and only find single alignments even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a qua...

DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition

We describe a set of top-performing systems at the SemEval 2015 English Semantic Textual Similarity (STS) task. Given two English sentences, each system outputs the degree of their semantic similarity. Our unsupervised system, which is based on word alignments across the two input sentences, ranked 5th among 73 submitted system runs with a mean correlation of 79.19% with human annotations. We a...

Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation

Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems. Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance. In this paper, we focus on further improving the performance of the reordering model (and thereby machine transla...

Publication date: 2016
